ABSTRACT 

Collaborative filtering (CF) is one of the most popular ap- 
proaches to build a recommendation system. In this paper, 
we propose a hybrid collaborative filtering model based on 
a Markovian random walk to address the data sparsity and 
cold start problems in recommendation systems. More pre- 
cisely, we construct a directed graph whose nodes consist of 
items and users, together with item content, user profile and 
social network information. We incorporate users' ratings 
into edge settings in the graph model. The model provides 
personalized recommendations and predictions to individu- 
als and groups. The proposed algorithms are evaluated on 
MovieLens and Epinions datasets. Experimental results show 
that the proposed methods perform well compared with other 
graph-based methods, especially in the cold start case. 

Index Terms — Recommendation system, random walk, 
social networks, hybrid collaborative filtering model 

1. INTRODUCTION 

Over the last decade, the commercialization of early gener- 
ations of recommendation systems achieved great success. 
Recommendation systems serve as an important component 
of online retail and Video on Demand (VoD) services such as 
Amazon and Netflix [1]. Recommenders typically provide the
target user a list of customized recommendations through col- 
laborative filtering or content-based filtering. Intensive work 
has been done to improve the performance of both of these 
techniques. Traditional recommendation systems assume that 
users are independent, and recommendations are given ac- 
cording to users' explicit or implicit rating history and/or item 
content information [3]. Problems such as data sparsity, 
cold start, and shilling attack still challenge the design of 
recommendation systems [3]. User profile and social information, 
on the other hand, provide extra information on user preference. 
This information is especially helpful when giving recommendations 
to a new user with little or no rating history. The emergence of 
e-commerce and online social networks provides us with a good 
opportunity to integrate user social information into the 
recommendation model, so as to improve the recommendation results 
or to alleviate the cold start 
problem [4][5]. 

Collaborative filters use the known preferences of users 
to make recommendations or predictions to a target user. 
Memory-based collaborative filtering uses the entire user- 
item database to calculate the similarity value between users 
or items, and then a weighted sum is taken as a prediction 
for the target user on a certain item. See, for example, 
GroupLens [6]. Model-based approaches such as Bayesian Belief 
Net CF [7] and regression-based CF [8] learn a complex pattern 
from training data and use the model to predict a user's 
preference. The most closely related works are [2], [9], [10] 
and [11]. Fouss et al. [2] suggested a dissimilarity measure 
between nodes of a graph, the expected commute time between two 
nodes, which the authors applied to collaborative filtering. 
Specifically, they constructed an undirected bipartite graph where 
nodes are users and movies. A link is placed if the user watched 
that movie. Movies are then ranked in ascending order according 
to the average commute time to the target node. Gori et al. [9] 
built their graph model by only using items as nodes. In [9], two 
nodes are connected if at least one user rated both nodes. The 
weight of the edge is set as the number of users who rated both 
of the nodes. A random-walk based algorithm is then used to rank 
items according to the target user's preference record. In [10], 
the authors combine the trust-based and collaborative filtering 
approaches for recommendation. Target users take a finite-step 
random walk on a trust network, so as to use the ratings by 
trusted users to assist prediction. More recently, Bogers [11] 
proposed ContextWalk, a collaborative filtering method that 
includes different types of contextual information by taking 
random walks over the contextual graph. 

Fig. 1. Hybrid collaborative filtering graph example. 

In this paper, we propose a random walk based hybrid 
collaborative filtering model that incorporates the social 
information of users. It is shown in [12] that a random walk 
approach is very effective in link prediction on social networks. 
Inspired by [12] and [13], we create a recommendation graph, 
as shown in Fig. 1, consisting of items, users, item genres, and 
user profile information as nodes. Similar to PageRank, the 
stable distribution resulting from a random walk on the graph 
is interpreted as a ranking of the nodes for the purpose of 
recommendation and prediction. The structure of the collaborative 
filtering part of the recommendation graph is similar to the graph 
proposed in [2] and [11] in that the user node u and the item 
node i are connected if there is a rating record of u on i. 
Unfortunately, in [11], the author did not provide experimental 
results to evaluate the performance, and the edge settings for 
constructing the network are not clear. In [2], the authors 
assigned unit weight to the edges in the graph, which cannot 
capture the user preference effectively. The expected commute 
time between item and user nodes was taken as the similarity 
measure to make recommendations. In [2] and [11], the authors 
only gave a list of recommended items; no rating prediction is 
available. In this paper, the edge weights of the graph are 
related to the user rating scores instead of simply being set to 
a unit value. Apart from the collaborative filtering graph, which 
only contains user rating information, we add user social profile 
and social network information, which makes it possible to 
provide customized recommendations to new users even if no 
previous rating information is available. The main contributions 
of this paper are: (1) we propose a hybrid collaborative filtering 
graph model incorporating user social network, user profile 
information, item content and user-item rating history to give 
recommendations; (2) we describe in detail the construction and 
edge weight assignment which reflect user preferences effectively; 
(3) we extend the application of the recommendation algorithm to 
group recommendation; (4) we design experiments on multiple data 
sets to evaluate the performance of the proposed algorithm. 

In a typical setting, there is a list of m users 
$\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ and a list of n items 
$\mathcal{I} = \{i_1, i_2, \ldots, i_n\}$. Each user $u_j$ has a 
list of items $I_{u_j}$ that the user has rated or from which the 
user's preference can be inferred. The ratings can either be 
explicit, for example, on a 1-5 scale as in Netflix, or implicit, 
such as purchases or clicks. These data form an $m \times n$ 
rating matrix $R$, where $R_{ui}$ denotes the rating of user u on 
item i. Assume that binary tagging and user social information is 
given. Let $\mathcal{T} = \{t_1, t_2, \ldots, t_k\}$ be the set of 
tagging information of items. For example, for movies, 
$\mathcal{T}$ can be genre, main actor, release date, etc. 
$T_i \in \{0, 1\}^k$ denotes the features of item i, where k is 
the total number of tags. Correspondingly, let 
$\mathcal{P} = \{p_1, p_2, \ldots, p_l\}$ be the set of user 
profile information, including age, occupation, gender, etc. 
$P_u \in \{0, 1\}^l$ denotes the profile features of user u, where 
l is the dimension of the features of all users. 
$S = (\mathcal{U}, \mathcal{E}_s)$ contains social network 
information, represented by an undirected or directed graph, where 
$\mathcal{U}$ is a set of nodes and $\mathcal{E}_s$ is a set of 
edges. For all $u, v \in \mathcal{U}$, $(u, v) \in \mathcal{E}_s$ 
if v is a friend of u. We want to make recommendations for a 
target user or a group of users given the above information. 

The rest of this paper is organized as follows. We propose our 
random walk based recommendation model in Section 2. The 
performance of the proposed model is evaluated in Section 3, 
followed by conclusions and acknowledgements in Sections 4 and 5. 

2. A HYBRID COLLABORATIVE FILTERING 
MODEL BASED ON RANDOM WALKS 

In this section, we describe in detail our algorithm, which is 
inspired by Google's PageRank. Specifically, we describe how to 
construct the graph and make recommendations. 

PageRank [13] calculates a probability distribution representing 
the likelihood that a web surfer randomly clicking on 
web links will arrive at any webpage. A similar approach can 
be used for movie recommendation. Every time a user has 
watched a movie, the system may show some more movies 
that other users who like this one also like. As in PageRank, 
there is a damping factor to indicate that the movie watcher 
may finally stop browsing. Now the key issue is how to con- 
struct this recommendation graph and represent flow on the 
graph. 

2.1. Graph construction 

2.1.1. Graph settings 

Let $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ be a directed 
graph model for CF, where 
$\mathcal{V} := \mathcal{U} \cup \mathcal{I} \cup \mathcal{T} \cup \mathcal{P}$. 
The nodes of the graph consist of users, items, item information 
and user profiles. For $v_i, v_j \in \mathcal{V}$, 
$(v_i, v_j) \in \mathcal{E}$ if and only if there is an edge 
pointing from $v_i$ to $v_j$, which is determined as given below. 
The weights are specified in the next subsection. 

• For $u \in \mathcal{U}$, $i \in \mathcal{I}$, 
$(u, i) \in \mathcal{E}$ and $(i, u) \in \mathcal{E}$ if and only 
if $R_{ui} \neq 0$, i.e., an item i and a user u are connected if 
there is a rating record of user u on item i, with weights 
$w_{ui}$ and $w_{iu}$. 

• For $i \in \mathcal{I}$, $t \in \mathcal{T}$, 
$(i, t) \in \mathcal{E}$ and $(t, i) \in \mathcal{E}$ if and only 
if $T_i^{(t)} \neq 0$, i.e., the item i and tag t are connected 
if i is tagged by t, with weights $w_{it}$ and $w_{ti}$. 

• For $u \in \mathcal{U}$, $p \in \mathcal{P}$, 
$(u, p) \in \mathcal{E}$ and $(p, u) \in \mathcal{E}$ if and only 
if $P_u^{(p)} \neq 0$, i.e., a profile feature p and a user u are 
connected if the user u belongs to the profile category p, with 
weights $w_{up}$ and $w_{pu}$. 

• For $u_1, u_2 \in \mathcal{U}$, $(u_1, u_2) \in \mathcal{E}$ if 
and only if $(u_1, u_2) \in \mathcal{E}_s$, with weight 
$w_{u_1 u_2}$. Note that the relationship in social networks is 
not necessarily mutual; it could be a unilateral relationship, 
such as in Twitter, Epinions, etc. 
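
The four connection rules above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: the function name `build_edges`, the node labels `("u", idx)`, `("i", idx)`, `("t", idx)`, `("p", idx)`, and the toy matrices are all hypothetical, and edge weights are left out because they are assigned separately.

```python
import numpy as np

def build_edges(R, T, P, S):
    """Return the directed edge set of the recommendation graph.

    R: (m, n) rating matrix, 0 means "no rating"
    T: (n, k) binary item-tag matrix
    P: (m, l) binary user-profile matrix
    S: iterable of directed (u1, u2) social links between user indices
    """
    edges = set()

    def connect_both_ways(a, b):
        edges.add((a, b))
        edges.add((b, a))

    # user <-> item when a rating record exists
    for u, i in zip(*np.nonzero(R)):
        connect_both_ways(("u", int(u)), ("i", int(i)))
    # item <-> tag when the item carries the tag
    for i, t in zip(*np.nonzero(T)):
        connect_both_ways(("i", int(i)), ("t", int(t)))
    # user <-> profile feature when the user belongs to that category
    for u, p in zip(*np.nonzero(P)):
        connect_both_ways(("u", int(u)), ("p", int(p)))
    # social links are kept one-directional unless S lists both directions
    for u1, u2 in S:
        edges.add((("u", u1), ("u", u2)))
    return edges

# Toy data (hypothetical): 2 users, 2 items, 1 tag, 2 profile features.
R = np.array([[5, 0], [0, 3]])
T = np.array([[1], [1]])
P = np.array([[1, 0], [0, 1]])
S = [(0, 1)]
edges = build_edges(R, T, P, S)
```

Keeping the social links directed mirrors the remark that trust or follow relations need not be mutual.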



2.1.2. Edge weight assignment 

The main part of our rank graph is the collaborative filtering 
graph, which includes the user nodes, item nodes and the edges 
between them. The weights of edges in the collaborative filtering 
graph are assigned as follows: 

$$w_{ui} = w_{iu} = \exp(r_{ui} - \bar{r}_u), \tag{1}$$ 

$$\bar{r}_u = \frac{\sum_{i \in I_u} r_{ui}}{|I_u|}, \tag{2}$$ 

where $I_u$ denotes the set of items which user u has rated. Note 
that a larger edge weight indicates a higher chance that the 
random walk passes through that edge. If user u's rating on item 
i, $r_{ui}$, is lower than the average rating $\bar{r}_u$, then 
$w_{ui}$ and $w_{iu}$ are less than 1; otherwise they are greater 
than or equal to 1. The assignment of weights does not depend on 
the variance of the user's ratings. 

For the extended graph, i.e. nodes and edges containing item 
content, user profile or social network information, we simply 
assign an edge weight of 1 if an edge is present. 

2.2. Rank score computation 

2.2.1. Random walk on a weighted graph 

A random walk is a Markov process with random variables 
$X_1, X_2, \ldots, X_t, \ldots$ such that the next state only 
depends on the current state. For a random walk on a weighted 
graph, $X_{t+1}$ is a vertex chosen according to the following 
probability distribution: 

$$P_{ij} := P(X_{t+1} = j \mid X_t = i) = \frac{w_{ij}}{\sum_{k \in \mathcal{N}_i} w_{ik}}, \tag{3}$$ 

where $\mathcal{N}_i$ are the neighbors of i, 
$\mathcal{N}_i := \{k \mid (i, k) \in \mathcal{E}\}$. As mentioned 
in Section 2.1.2, a higher weight indicates a higher chance that 
the random walk moves through that edge. 

2.2.2. Rank score computation 

Consider the recommendation graph 
$\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$. Let 
$\nu = |\mathcal{V}|$ denote the number of nodes in the graph, and 
let $\theta$ be a $\nu \times 1$ customized probability vector. 
For a target user u, 

$$\theta = e_u, \tag{4}$$ 

where $e_1, e_2, \ldots, e_\nu$ are the standard basis column 
vectors. $\beta$ is a damping factor. With probability 
$1 - \beta$, the random walk is teleported back to node u. The 
rank score s satisfies the following equation: 

$$s = \beta W s + (1 - \beta)\theta, \tag{5}$$ 

where W is the weighted transition matrix with $W_{ij} = P_{ji}$. 
Since the entries of s sum to one, i.e. $\mathbf{1}^T s = 1$, we 
have 

$$s = (\beta W + (1 - \beta)\theta \mathbf{1}^T) s := M s. \tag{6}$$ 

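Hence the rank score is the principal eigenvector of M, which 
can be computed quickly and easily by iteration, as shown below: 

    $s_i^{(0)} \leftarrow 1/\nu$ for all $i$ 
    $t \leftarrow 0$ 
    repeat 
        $t \leftarrow t + 1$ 
        for $i = 1$ to $\nu$ do 
            $s_i^{(t)} \leftarrow \sum_j M_{ij} s_j^{(t-1)}$ 
        end for 
    until $|s^{(t)} - s^{(t-1)}| < \epsilon$ 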
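
As a rough illustration of the whole pipeline from edge weights to rank scores, the sketch below normalises a weight matrix into transition probabilities, transposes it, and iterates the rank equation with restart at the target user. Several things here are assumptions rather than details from the paper: the function name `personalized_rank`, the damping value 0.85, the tolerance, the L1 stopping norm, and the toy ratings. The rating weight is taken as the exponential of the rating minus the user's mean, which is how Section 2.1.2 is read here.

```python
import numpy as np

def personalized_rank(weights, theta, beta=0.85, eps=1e-10, max_iter=1000):
    """Iterate s <- beta * W s + (1 - beta) * theta until it stops changing.

    weights[i, j] is the weight of edge i -> j (0 if the edge is absent).
    theta is the restart distribution (non-negative, sums to 1).
    beta, eps and max_iter are illustrative defaults, not values from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    theta = np.asarray(theta, dtype=float)
    out = weights.sum(axis=1, keepdims=True)
    if np.any(out == 0):
        raise ValueError("every node needs at least one outgoing edge")
    P = weights / out      # P[i, j] = w_ij / sum_k w_ik
    W = P.T                # W[i, j] = P[j, i], so columns of W sum to 1
    s = np.full(len(theta), 1.0 / len(theta))
    for _ in range(max_iter):
        s_next = beta * (W @ s) + (1.0 - beta) * theta
        if np.abs(s_next - s).sum() < eps:
            return s_next
        s = s_next
    return s

# Toy graph (hypothetical): nodes 0 and 1 are users, nodes 2 and 3 are items.
ratings = {(0, 2): 5.0, (0, 3): 1.0, (1, 2): 4.0}
mean_rating = {0: 3.0, 1: 4.0}
weights = np.zeros((4, 4))
for (u, i), r in ratings.items():
    # assumed weight: exp(rating - user's mean rating), same in both directions
    weights[u, i] = weights[i, u] = np.exp(r - mean_rating[u])
theta = np.array([1.0, 0.0, 0.0, 0.0])   # restart at target user 0
s = personalized_rank(weights, theta)
```

In this toy graph, item 2 (rated above user 0's mean) ends up with a higher score than item 3 (rated below it), which is the qualitative behaviour the weighting is meant to produce.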

Similar to PageRank, the rank score s is interpreted as 
the importance of other nodes to the target user u. It is easy 
to see that we can increase the rank score by shortening the 
distance, adding more paths, or increasing the weight on the 
path to u. These are desired properties in a recommendation 
system. For example, even if item i is not directly connected 
with u, if it is in the category to which many of u's highly 
rated items belong, i is very likely to have a high rank score. 
Similarly, if users u and u' have similar opinions on a variety 
of items, u' will have a high rank, so we can use the explicit 
ratings of u' to make recommendations and predictions for u. 

2.3. Recommendation 

2.3.1. Direct method 

Solving Equation (5) iteratively, we obtain a rank score for all 
nodes of the recommendation graph $\mathcal{G}$. Since the rank 
score represents the importance to the target user, we then 
separate and sort the scores according to the categories, i.e. 
users $\mathcal{U}$, items $\mathcal{I}$, tags $\mathcal{T}$, etc. 
Sorted items excluding $I_u$ form a recommendation list for the 
target user u. We can compute the recommendation for every user. 
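
A minimal sketch of the direct method, assuming the rank scores are already available as a sequence indexed by node id. The function name `recommend_items`, the node ids and the scores are hypothetical.

```python
def recommend_items(s, item_nodes, rated_items, k=10):
    """Rank item nodes by score, skipping items the target user already rated."""
    candidates = [i for i in item_nodes if i not in rated_items]
    candidates.sort(key=lambda i: s[i], reverse=True)
    return candidates[:k]

# Hypothetical scores for 5 nodes; nodes 2, 3 and 4 are items.
s = [0.40, 0.05, 0.20, 0.10, 0.25]
top = recommend_items(s, item_nodes=[2, 3, 4], rated_items={2}, k=2)
print(top)  # [4, 3]
```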



2.3.2. User-based recommendation 

Similar to memory-based collaborative filtering, which uses 
Pearson correlation [6] as a similarity measure between users 
and items, we use the rank score as an influence measure to make 
predictions. Given the rank score of the user set $\mathcal{U}$, 
we take the weighted sum of users' ratings on item i as a 
prediction for the target user u, as shown in Equation (7): 

$$\hat{r}_{ui}^{\text{user}} = \bar{r}_u + \frac{\sum_{x \in U_i} s_x (r_{xi} - \bar{r}_x)}{\sum_{x \in U_i} s_x}, \tag{7}$$ 

where $U_i$ is the set of users who have rated item i and $s_x$ 
is the target user's personalized rank score of user x. 

2.3.3. Item-based recommendation 



As in Section 2.3.2, in order to perform an item-based 
recommendation, we can use the rank score of the item set 
$\mathcal{I}$ as weight to predict the rating of item i for the 
target user u, as shown in Equation (8): 

$$\hat{r}_{ui}^{\text{item}} = \frac{\sum_{j \in I_u} s_j r_{uj}}{\sum_{j \in I_u} s_j}. \tag{8}$$ 

In Equation (8) we use the user's ratings on similar items to 
predict the rating on i. $s_j$ is the target user's personalized 
rank score of item j. 
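
To make the user-based prediction of Section 2.3.2 concrete, here is a small sketch. It assumes a mean-centred weighted average (each rater's offset from their own mean, weighted by rank score, added to the target user's mean); the function name `predict_user_based`, the fallback to the user's mean when nobody has rated the item, and the toy data are assumptions, not taken from the paper.

```python
import numpy as np

def predict_user_based(R, s_users, u, i):
    """Predict user u's rating of item i from the ratings of ranked users.

    R: (m, n) rating matrix with 0 for "not rated"
    s_users: length-m rank scores of the user nodes, personalised to u
    """
    R = np.asarray(R, dtype=float)
    s_users = np.asarray(s_users, dtype=float)
    rated = R != 0
    # mean rating of each user over the items that user has rated
    means = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)
    raters = np.flatnonzero(rated[:, i])
    raters = raters[raters != u]
    if raters.size == 0 or s_users[raters].sum() == 0:
        return means[u]  # assumed fallback: the user's own mean
    offsets = R[raters, i] - means[raters]
    return means[u] + np.dot(s_users[raters], offsets) / s_users[raters].sum()

# Toy data (hypothetical): 3 users, 3 items, every user's mean rating is 3.
R = [[4, 0, 2],
     [5, 3, 1],
     [2, 4, 0]]
s_users = [0.5, 0.3, 0.1]
pred = predict_user_based(R, s_users, u=0, i=1)
```

Here users 1 and 2 rated item 1 with offsets 0 and +1 from their means, so the prediction is 3 + (0.3 * 0 + 0.1 * 1) / 0.4 = 3.25.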

2.4. Incremental computation 

In practice, the rating information and users' social information 
evolve. The recommendation graph changes when a new rating record 
is input, a new item is on sale, a new user registers, or even 
when a user changes their profile. Thanks to the popularity of 
PageRank, incremental computation of PageRank has been studied 
intensively [14][15]. It is shown in [15] that with a reset 
probability of $\epsilon$, the total work needed to maintain an 
accurate estimate of the PageRank of every node at all times is 
$O(n \ln m / \epsilon^2)$ in a network with n nodes and m edges. 
Since it is beyond the scope of this paper, we do not address the 
technical details of this problem. 

2.5. Discussions 

2.5.1. Recommendations for groups 

Because of the special structure of the rank graph, we can 
naturally extend the recommendation for individual users to 
groups. Note that in order to give recommendations for 
individuals, we set the personalized vector in Section 2.2.2 as 
$e_u$. Similarly, to make recommendations for a group of users 
$\mathcal{U}_g \subseteq \mathcal{U}$, we can set the personalized 
vector $\theta$ as 

$$\theta = \frac{1}{|\mathcal{U}_g|} \sum_{u \in \mathcal{U}_g} e_u. \tag{9}$$ 

The rest of the prediction procedure is the same as described in 
the previous sections. 
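
One simple way to build such a group vector, shown here only as a sketch, is to spread the restart probability evenly over the group members' user nodes. The uniform split, the function name `group_personalization` and the node indices are assumptions made for illustration.

```python
import numpy as np

def group_personalization(num_nodes, group_user_nodes):
    """Spread the restart probability evenly over the group's user nodes."""
    members = sorted(set(group_user_nodes))
    if not members:
        raise ValueError("the group must contain at least one user")
    theta = np.zeros(num_nodes)
    theta[members] = 1.0 / len(members)
    return theta

# Hypothetical graph with 6 nodes; the group consists of user nodes 0 and 2.
theta = group_personalization(6, [0, 2, 2])
```

The resulting vector is still a probability vector, so it can be used in place of the single-user restart vector without changing the iteration.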
Fig. 2. User rating distribution of Epinions and MovieLens 
datasets. 



2.5.2. Dealing with "cold" users and "cold" items 

A great challenge to recommendation systems resulting from 
data sparsity is the cold start problem, namely, the question 
of how to effectively give recommendations to new users. A 
naive approach is to provide the same recommendation to everyone. 
Studies show that two persons connected via a social relationship 
tend to have similar tastes, which is known as the "homophily 
principle" [16]. The availability of online social networks 
offers us extra information about new users. Given the social 
network information, if a new user is connected with other nodes 
in our recommendation graph, we can then make personalized 
recommendations for the "cold" user even if we do not have any 
rating information from this user. Similarly, for "cold" items we 
connect a new item in the recommendation graph according to its 
tagging information, so that we can then recommend the "cold" 
item to users. Experimental results are shown in Section 3. 



3. EXPERIMENTS 

3.1. Data sets 

In order to evaluate the performance of the proposed algorithm, 
we run experiments on the Epinions and MovieLens data sets, both 
of which are widely used benchmarks for recommendation systems. 
Epinions is a website where users can post their reviews and 
ratings (1-5) on a variety of items (songs, software, TVs, etc.), 
as well as their web of trust, i.e. "reviewers whose reviews and 
ratings they have consistently found to be valuable" [17]. We 
randomly select 946 items, 973 users and their trust network from 
the Epinions data set [17] to perform the experiments. The 
MovieLens data set consists of 1682 movies and 943 users. Movies 
are labeled by 19 genres. User profile information such as age, 
gender and occupation is also available. User rating distributions 
and histograms of ratings per user for both data sets are shown 
in Fig. 2 and Fig. 3. 

Fig. 3. Histogram of ratings per user of Epinions and MovieLens 
datasets. 



3.2. Experimental methodology and results 

We evaluate our results with two popular evaluation metrics 
for top-k recommendations: recall and percentile. 

Recall: We consider any item in the top-k recommendations that 
matches an item in the testing set as a "hit", as in [18]. The 
recall is then 

$$\text{recall} = \frac{\#\text{hits}}{T},$$ 

where T is the size of the testing set. A higher recall value 
indicates a better prediction. 

Fig. 4. Epinions data set top-k recall. 

Fig. 5. MovieLens data set top-k recall. 

Percentile: The individual percentile score is simply the 
average position (in percentage) that the item in the test set 
occupies in the recommendation list. For example, if four 
items are ranked 1st, 9th, 10th and 20th in a recommendation 
list consisting of 100 items, the percentile score is 0.1. A 
lower percentile indicates a better prediction. 
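
Both metrics are straightforward to compute from a ranked list. The sketch below uses hypothetical function names and a toy list in which each item id equals its rank, and it reproduces the percentile example given above.

```python
def recall_at_k(ranked_items, test_items, k):
    """Fraction of test items that appear among the first k recommendations."""
    hits = len(set(ranked_items[:k]) & set(test_items))
    return hits / len(test_items)

def percentile_score(ranked_items, test_items):
    """Mean 1-based rank of the test items divided by the list length."""
    position = {item: rank for rank, item in enumerate(ranked_items, start=1)}
    ranks = [position[item] for item in test_items if item in position]
    if not ranks:
        raise ValueError("no test item appears in the recommendation list")
    return sum(ranks) / len(ranks) / len(ranked_items)

ranked = list(range(1, 101))   # toy list: item id equals its rank
test = [1, 9, 10, 20]
print(percentile_score(ranked, test))  # 0.1
print(recall_at_k(ranked, test, 10))   # 0.75
```

The mean rank of the four test items is 10 in a list of 100, giving 0.1, and three of the four fall within the top 10, giving a recall of 0.75.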

In this experiment, the test set T contains all the 5-star 
rating records, thus we can consider them as relevant items 
for recommendation. The recommendation list has a length 
of 500 items for the Epinions data set and 900 for the MovieLens 
data set. We compare our methods, UserRank CF (without social 
information) and UserRank, described in Section 2, with two 
state-of-the-art collaborative filtering methods, L+ [2] and 
ItemRank [9], described in Section 1. 

Experimental results for the recall score are shown in Fig. 4 
and Fig. 5. We can see that UserRank has a higher recall score 
on both data sets compared with the baseline methods. However, 
in a "warm start" scenario, adding social information does 
not change the performance much. In Table 1 and Table 2, 
we compare the percentile values for both the warm start and 
cold start cases. It is worth noting that social information 
improves the performance of UserRank considerably in the "cold 
start" case. 



4. CONCLUSIONS 

In this paper, we present a hybrid collaborative filtering model 
based on a random walk for recommendation systems. It in- 
corporates item content and user social information to make 
recommendations and predictions for target users. Social 
information improves the "cold start" performance when user 
rating information is lacking. Experiments are performed on 
two standard real-world data sets. The experimental results 
show that the proposed method performs well compared to 
other state-of-the-art collaborative filtering methods. 

Table 1. Average percentile results obtained by 5-fold 
cross-validation for warm-start recommendation. 

Table 2. Average percentile results obtained by 5-fold 
cross-validation for cold-start recommendation. 